A Data Mining & Knowledge Discovery Process Model
Abstract
The number of data mining and knowledge discovery (DM & KD) projects has increased enormously over the past few years (Jaffarian et al., 2008) (Kdnuggets.com, 2007c). As DM & KD development projects became more complex, a number of problems emerged: continuous project planning delays, low productivity and failure to meet user expectations. Not all project results are useful (Kdnuggets.com, 2008) (Eisenfeld et al., 2003a) (Eisenfeld et al., 2003b) (Zornes, 2003), nor do all projects end successfully (McMurchy, 2008) (Kdnuggets.com, 2008) (Strand, 2000) (Edelstein & Edelstein, 1997). Today’s failure rate is over 50% (Kdnuggets.com, 2008) (Gartner, 2005) (Gondar, 2005). This situation is in a sense comparable to the circumstances surrounding the software industry in the late 1960s, which led to the ‘software crisis’ (Naur & Randell, 1969). Software development improved considerably as a result of the new methodologies that followed. These solved some of its earlier problems, and little by little software development grew into a branch of engineering. This shift has helped to solve project management and quality assurance problems, and it is also helping to increase productivity and improve software maintenance.
The history of DM & KD is not much different. In the early 1990s, when the term KDD (Knowledge Discovery in Databases) was first coined (Piatetsky-Shapiro & Frawley, 1991), there was a rush to develop DM algorithms capable of solving all the problems of searching for knowledge in data. Apart from algorithms, tools were also developed to simplify the application of DM algorithms. From the viewpoint of DM & KD process models, the year 2000 marked the most important milestone: CRISP-DM (CRoss-Industry Standard Process for DM) was published (Chapman et al., 2003). CRISP-DM is the most widely used methodology for developing DM & KD projects; it is actually a “de facto” standard.
Looking at the KDD process and how it has progressed, we find some parallels with the evolution of software development. From this viewpoint, DM project development entails defining development methodologies able to cope with the new project types, domains and applications that organizations have to come to terms with. Nowadays, SE (software engineering) pays special attention to organizational, management and other parallel activities not directly related to development, such as project completeness ...
Similar resources
Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services
The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer ...
Data Mining: A Novel Outlook to Explore Knowledge in Health and Medical Sciences
Today medical and Healthcare industry generate loads of diverse data about patients, disease diagnosis, prognosis, management, hospitals’ resources, electronic patient health records, medical devices and etc. Using the most efficient processing and analyzing method for knowledge extraction is a key point to cost-saving in clinical decision making. Data mining, sometimes called data or knowledge...
Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)
Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, decision making process is intimately related to some factors which determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated Companies, where managers ne...
A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis
Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...
Publication date: 2012